    Soundbite Detection in Broadcast News Domain

    In this paper, we present results of a study designed to identify SOUNDBITES in Broadcast News. We describe a Conditional Random Field-based model for the detection of these included speech segments uttered by individuals who are interviewed or who are the subject of a news story. Our goal is to identify direct quotations in spoken corpora which can be directly attributed to particular individuals, as well as to associate these soundbites with their speakers. We frame soundbite detection as a binary classification problem in which each turn is categorized as either a soundbite or not. We use lexical, acoustic/prosodic and structural features at the turn level to train a CRF. We performed a 10-fold cross-validation experiment in which we obtained an accuracy of 67.4% and an F-measure of 0.566, which are 20.9% and 38.6% higher than a chance baseline, respectively. Index Terms: soundbite detection, speaker roles, speech summarization, information extraction
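
The turn-level framing above can be sketched as follows. This is an illustrative sketch only: the `Turn` structure and the specific feature names are assumptions, not the paper's actual feature set, and the resulting sequence of feature dicts is what a linear-chain CRF library would consume.

```python
# Sketch: building one feature dict per turn for a binary (soundbite /
# not-soundbite) sequence classifier. Feature names are illustrative.
from dataclasses import dataclass

@dataclass
class Turn:
    words: list          # lexical content of the turn
    start: float         # start time in seconds
    end: float           # end time in seconds
    mean_pitch: float    # an acoustic/prosodic measurement (Hz)

def turn_features(turns, i):
    """One feature dict per turn; a linear-chain CRF scores the
    whole sequence of such dicts jointly."""
    t = turns[i]
    return {
        "duration": t.end - t.start,              # structural
        "num_words": len(t.words),                # lexical
        "mean_pitch": t.mean_pitch,               # acoustic/prosodic
        "position": i / max(len(turns) - 1, 1),   # structural: place in show
        "is_first": i == 0,
    }

turns = [Turn(["good", "evening"], 0.0, 1.2, 180.0),
         Turn(["we", "were", "all", "shocked"], 1.2, 3.0, 210.0)]
features = [turn_features(turns, i) for i in range(len(turns))]
```

A CRF toolkit would pair each feature dict with a binary label and learn weights over both the features and the label transitions between adjacent turns.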

    Summarizing Speech Without Text Using Hidden Markov Models

    We present a method for summarizing speech documents without using any type of transcript/text in a Hidden Markov Model framework. The hidden variables or states in the model represent whether a sentence is to be included in a summary or not, and the acoustic/prosodic features are the observation vectors. The model predicts the optimal sequence of segments that best summarize the document. We evaluate our method by comparing the predicted summary with one generated by a human summarizer. Our results indicate that we can generate 'good' summaries even when using only acoustic/prosodic information, which points toward the possibility of text-independent summarization for spoken documents.
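
The two-state idea can be sketched with a minimal Viterbi decoder. All probabilities and the toy emission model below are invented for illustration; a real system would train emission densities over the full acoustic/prosodic feature vectors.

```python
# Minimal sketch: states S (sentence in summary) and N (not in summary),
# one acoustic/prosodic observation per sentence, Viterbi for the best
# state sequence. All numbers are made up for illustration.
import math

STATES = ["S", "N"]
trans = {("S", "S"): 0.4, ("S", "N"): 0.6,   # transition probabilities
         ("N", "S"): 0.3, ("N", "N"): 0.7}
start = {"S": 0.3, "N": 0.7}

def emit(state, obs):
    """Toy emission model: high-energy, slow sentences look summary-like.
    obs = (energy, speaking_rate)."""
    energy, rate = obs
    p = 1 / (1 + math.exp(-(energy - rate)))   # squash to (0, 1)
    return p if state == "S" else 1 - p

def viterbi(observations):
    V = [{s: math.log(start[s]) + math.log(emit(s, observations[0]))
          for s in STATES}]
    back = []
    for obs in observations[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p_: V[-1][p_] + math.log(trans[(p_, s)]))
            col[s] = V[-1][prev] + math.log(trans[(prev, s)]) + math.log(emit(s, obs))
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    path = [max(STATES, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

obs = [(2.0, 0.5), (0.2, 1.5), (1.8, 0.4)]   # (energy, rate) per sentence
states = viterbi(obs)
```

The decoded state sequence marks which sentences the model would extract into the summary, using no transcript at all.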

    A Phrase-Level Machine Translation Approach For Disfluency Detection Using Weighted Finite State Transducers

    We propose a novel algorithm to detect disfluencies in speech by reformulating the problem as phrase-level statistical machine translation using weighted finite state transducers. We approach the task as translation of noisy speech to clean speech. We simplify our translation framework such that it does not require fertility and alignment models. We tested our model on the Switchboard disfluency-annotated corpus. Using an optimized decoder that was developed for phrase-based translation at IBM, we are able to detect repeats, repairs and filled pauses for more than a thousand sentences in less than a second with encouraging results. Index Terms: disfluency detection, machine translation, speech-to-speech translation
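
The noisy-to-clean relation that the paper learns can be illustrated with a toy rule-based stand-in. The paper compiles phrase-level translation rules into a weighted FST and searches with a phrase-based MT decoder; the sketch below only demonstrates the intended input/output mapping for repeats and filled pauses, not the actual WFST machinery.

```python
# Toy stand-in for "noisy speech -> clean speech": delete filled pauses
# and collapse immediate exact repeats. The filled-pause list is an
# illustrative assumption.
FILLED_PAUSES = {"uh", "um", "er"}

def clean(tokens):
    out = []
    for tok in tokens:
        if tok in FILLED_PAUSES:
            continue                  # delete filled pauses
        if out and out[-1] == tok:
            continue                  # collapse immediate repeats
        out.append(tok)
    return out

print(clean("i i uh want want to go".split()))
# -> ['i', 'want', 'to', 'go']
```

Repairs (e.g. "to Boston uh to Denver") need the statistical phrase model precisely because they cannot be caught by exact-match rules like these.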

    Improved Name-Recognition with Meta-data Dependent Name Networks

    A transcription system that requires accurate general name transcription is faced with the problem of covering the large number of names it may encounter. Without any prior knowledge, this requires a large increase in the size and complexity of the system due to the expansion of the lexicon. Furthermore, this increase will adversely affect system performance due to the increased confusability. Here we propose a method that uses meta-data, available at runtime, to ensure better name coverage without significantly increasing the system complexity. We tested this approach on a voicemail transcription task and assumed meta-data to be available in the form of a caller ID string (as it would show up on a caller ID enabled phone) and the name of the mailbox owner. Networks representing possible spoken realizations of those names are generated at runtime and included in the network of the decoder. The decoder network is built at training time using a class-dependent language model, with caller and mailbox name instances modeled as class tokens. The class tokens are replaced at test time with the name networks built from the meta-data. The proposed algorithm showed a reduction in the error rate of name tokens of 22.1%.
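
The runtime name-network step can be sketched as generating a set of candidate spoken realizations from the caller ID string. The particular variants below (full name, first name alone, surname alone, titled surname) and the assumed "LAST FIRST" caller-ID ordering are illustrative assumptions; the paper compiles such variants into a small network spliced into the decoder at the class-token positions.

```python
# Sketch: candidate spoken realizations of a name taken from a
# caller-ID string such as "DOE JOHN". The variant set is illustrative.
def name_realizations(caller_id):
    parts = caller_id.lower().split()
    if len(parts) < 2:
        return {" ".join(parts)}
    last, first = parts[0], parts[1]      # assume "LAST FIRST" ordering
    return {f"{first} {last}",            # "john doe"
            first,                        # "john"
            last,                         # "doe"
            f"mister {last}"}             # a titled variant

variants = name_realizations("DOE JOHN")
```

In the full system, each string in this set would be expanded into its pronunciations and the resulting subnetwork substituted for the caller-name class token in the class-dependent language model.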

    From Text to Speech Summarization

    In this paper, we present approaches used in text summarization, showing how they can be adapted for speech summarization and where they fall short. The informal style and apparent lack of structure in speech mean that the typical approaches used for text summarization must be extended for use with speech. We illustrate how features derived from speech can help determine summary content within two ongoing summarization projects at Columbia University.